lmcinnes's Repositories
59 repositories
ann-benchmarks
Benchmarks of approximate nearest neighbor libraries in Python
⭐ 2
🌐 Public
apricot
apricot implements submodular optimization for the purpose of selecting subsets of massive data sets to train machine learning models quickly. See the documentation page: https://apricot-select.readthedocs.io/en/latest/index.html
⭐ 0
🌐 Public
bayesian_tsne
A flexible Bayesian approach to t-SNE dimension reduction.
⭐ 6
🌐 Public
BERTopic
Leveraging BERT and c-TF-IDF to create easily interpretable topics.
⭐ 1
🌐 Public
bokeh
Interactive Web Plotting for Python
⭐ 2
🌐 Public
conda-forge-pinning-feedstock
A conda-smithy repository for conda-forge-pinning.
⭐ 0
🌐 Public
conf
A collection of talks and tutorials from conferences I attend
⭐ 1
🌐 Public
datamapplot_examples
Hosting examples of interactive datamapplot output
⭐ 28
🌐 Public
datashader
Quickly and accurately render even the largest data.
⭐ 2
🌐 Public
embetter
Just a bunch of useful embeddings
⭐ 2
🌐 Public
enstop
Ensemble topic modelling with pLSA
⭐ 114
🌐 Public
fidgit
An ungodly union of GitHub and Figshare
⭐ 1
🌐 Public
geomstats
Computations and statistics on manifolds with geometric structures.
⭐ 1
🌐 Public
glasbey
Algorithmically create or extend categorical colour palettes
⭐ 227
🌐 Public
hdbscan
A high performance implementation of HDBSCAN clustering. http://hdbscan.readthedocs.io/en/latest/
⭐ 101
🌐 Public
hdbscan-feedstock
A conda-smithy repository for hdbscan.
⭐ 1
🌐 Public
hdbscan_paper
Source files and notebooks for a paper on accelerating HDBSCAN*
⭐ 35
🌐 Public
hypergraph
A library for hypergraphs and hypergraph algorithms
⭐ 28
🌐 Public
hyperspy
Multidimensional data analysis
⭐ 3
🌐 Public
hypertools
A Python toolbox for gaining geometric insights into high-dimensional data
⭐ 1
🌐 Public
jupyter-tutorial
"The world of Jupyter"—a tutorial
⭐ 2
🌐 Public
kepler-mapper
KeplerMapper is a Python class for visualization of high-dimensional data and 3-D point cloud data.
⭐ 4
🌐 Public
kereru
A density based clustering library
⭐ 8
🌐 Public
kmodes
Python implementations of the k-modes and k-prototypes clustering algorithms, for clustering categorical data
⭐ 3
🌐 Public
LargeVis
No description
⭐ 4
🌐 Public
llm_ai_eo
Language Models on the AI Executive Order — Deploying text embeddings and Llama2 to answer questions about the Executive Order on Artificial Intelligence issued by President Biden
⭐ 1
🌐 Public
LOD_text_layer
A deck.gl composite layer providing level of detail text support
⭐ 1
🌐 Public
mpl-third-party
Third-party Packages Webpage
⭐ 0
🌐 Public
param_docs
Handle Param sphinx docstrings
⭐ 0
🌐 Public
penguins
A great intro dataset for data exploration & visualization (alternative to iris).
⭐ 1
🌐 Public
persistence_wasserstein_benchmarking
Tools for benchmarking implementations of Wasserstein-Kantorovich distance between persistence diagrams
⭐ 0
🌐 Public
pomegranate
Fast, flexible and easy to use probabilistic modelling in Python.
⭐ 0
🌐 Public
pynndescent
A Python implementation of nearest neighbor descent for approximate nearest neighbors
⭐ 956
🌐 Public
pynndescent-feedstock
A conda-smithy repository for pynndescent.
⭐ 0
🌐 Public
python3statement.github.io
No description
⭐ 1
🌐 Public
readthedocs.org
source code to readthedocs.org
⭐ 1
🌐 Public
sandbox-topically
Topic modeling helpers using managed language models from Cohere. Name text clusters using large GPT models.
⭐ 1
🌐 Public
scanpy
Single-Cell Analysis in Python. Scales to >1M cells.
⭐ 0
🌐 Public
scikit-learn
scikit-learn: machine learning in Python
⭐ 3
🌐 Public
scipy
SciPy library main repository
⭐ 1
🌐 Public
scipy-2017-sklearn
SciPy 2017 scikit-learn tutorial by Alex Gramfort and Andreas Mueller
⭐ 5
🌐 Public
scipy2018-jupyterlab-tutorial
Tutorial material and instructions for the SciPy 2018 JupyterLab tutorial
⭐ 1
🌐 Public
seaborn
Statistical data visualization using matplotlib
⭐ 1
🌐 Public
sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
⭐ 0
🌐 Public
Slides-SciPyConf-2018
A repository for public storage of slides given at the 17th Python in Science Conference (2018)
⭐ 2
🌐 Public
sstsne
Semi-Supervised t-SNE using a Bayesian prior based on partial labelling
⭐ 42
🌐 Public
staged-recipes
A place to submit conda recipes before they become fully fledged conda-forge feedstocks
⭐ 1
🌐 Public
subreddit_mapping
Notebooks and data associated with constructing and exploring a map of subreddits.
⭐ 55
🌐 Public
tdc
Topological Density-based Clustering
⭐ 8
🌐 Public
text-clustering
No description
⭐ 3
🌐 Public
TextMAP
Tools for word and document embedding using UMAP
⭐ 3
🌐 Public
thisnotthat
A visual labeling system implemented in Jupyter widgets.
⭐ 11
🌐 Public
topic_mapping
No description
⭐ 7
🌐 Public
umap
Uniform Manifold Approximation and Projection
⭐ 8035
🌐 Public
umap_doc_notebooks
Notebooks used to generate some of the UMAP documentation
⭐ 3
🌐 Public
umap_paper_notebooks
Notebooks in support of the UMAP paper
⭐ 44
🌐 Public
vectorizers
Vectorizers for a range of different data types
⭐ 2
🌐 Public
vectorizers_playground
Using the TIMC Document Vectorizers library
⭐ 0
🌐 Public
vibe
Vector Index Benchmark for Embeddings (VIBE) is an extensible benchmark for approximate nearest neighbor search methods, or vector indexes, using modern embedding datasets.
⭐ 1
🌐 Public